
Creators/Authors contains: "Beesley, Lauren J."


  1. Electronic health records (EHR) are not designed for population‐based research, but they provide easy and quick access to longitudinal health information for a large number of individuals. Many statistical methods have been proposed to account for selection bias, missing data, phenotyping errors, or other problems that arise in EHR data analysis. However, addressing multiple sources of bias simultaneously is challenging. We developed a methodological framework (R package, SAMBA) for jointly handling both selection bias and phenotype misclassification in the EHR setting that leverages external data sources. These methods assume factors related to selection and misclassification are fully observed, but these factors may be poorly understood and partially observed in practice. As a follow‐up to the methodological work, we demonstrate how to apply these methods in two real‐world case studies, and we evaluate their performance. In both examples, we use individual patient‐level data collected through the University of Michigan Health System and various external population‐based data sources. In case study (a), we explore the impact of these methods on estimated associations between gender and cancer diagnosis. In case study (b), we compare corrected associations between previously identified genetic loci and age‐related macular degeneration with gold standard external summary estimates. These case studies illustrate how to utilize diverse auxiliary information to achieve less biased inference in EHR‐based research.

  2. Susceptible-Exposed-Infected-Removed (SEIR)-type epidemiologic models, modeling unascertained infections latently, can predict unreported cases and deaths assuming perfect testing. We apply a method we developed to account for the high false negative rates of diagnostic RT-PCR tests for detecting an active SARS-CoV-2 infection in a classic SEIR model. Because the numbers of unascertained cases and false negatives are unobservable in a real study, population-based serosurveys can help validate model projections. Applying our method to training data from Delhi, India, during March 15–June 30, 2020, we estimate the underreporting factor for cases at 34–53 (deaths: 8–13) on July 10, 2020, largely consistent with the findings of the first round of serosurveys for Delhi (done during June 27–July 10, 2020) with an estimated 22.86% IgG antibody prevalence, yielding estimated underreporting factors of 30–42 for cases. Together, these imply that approximately 96–98% of cases in Delhi remained unreported (July 10, 2020). Updated calculations using training data during March 15–December 31, 2020 yield an estimated underreporting factor for cases of 13–22 (deaths: 3–7) on January 23, 2021, which is again consistent with the latest (fifth) round of serosurveys for Delhi (done during January 15–23, 2021) with an estimated 56.13% IgG antibody prevalence, yielding an estimated range for the underreporting factor for cases of 17–21. Together, these updated estimates imply that approximately 92–96% of cases in Delhi remained unreported (January 23, 2021). Such model-based estimates, updated with the latest data, provide a viable alternative to repeated resource-intensive serosurveys for tracking unreported cases and deaths and gauging the true extent of the pandemic.

  3. False negative rates of severe acute respiratory syndrome coronavirus 2 diagnostic tests, together with selection bias due to prioritized testing, can result in inaccurate modeling of COVID‐19 transmission dynamics based on reported “case” counts. We propose an extension of the widely used Susceptible‐Exposed‐Infected‐Removed (SEIR) model that accounts for misclassification error and selection bias, and derive an analytic expression for the basic reproduction number as a function of false negative rates of the diagnostic tests and selection probabilities for getting tested. Analyzing data from the first two waves of the pandemic in India, we show that correcting for misclassification and selection leads to more accurate prediction in a test sample. We provide estimates of undetected infections and deaths between April 1, 2020 and August 31, 2021. At the end of the first wave in India, as of February 1, 2021, the estimated under‐reporting factor was 11.1 (95% CI: 10.7, 11.5) for cases and 3.58 (95% CI: 3.5, 3.66) for deaths; as of July 1, 2021, these had changed to 19.2 (95% CI: 17.9, 19.9) and 4.55 (95% CI: 4.32, 4.68). Equivalently, 9.0% (95% CI: 8.7%, 9.3%) and 5.2% (95% CI: 5.0%, 5.6%) of total estimated infections were reported on these two dates, while 27.9% (95% CI: 27.3%, 28.6%) and 22% (95% CI: 21.4%, 23.1%) of estimated total deaths were reported. Extensive simulation studies demonstrate the effect of misclassification and selection on estimation of the basic reproduction number and prediction of future infections. An R package, SEIRfansy, is developed for broader dissemination.

  4. Large‐scale association analyses based on observational health care databases such as electronic health records have been a topic of increasing interest in the scientific community. However, challenges due to nonprobability sampling and phenotype misclassification associated with the use of these data sources are often ignored in standard analyses. The extent of the bias introduced by ignoring these factors is not well‐characterized. In this paper, we develop an analytic framework for characterizing the bias expected in disease‐gene association studies based on electronic health records when disease status misclassification and the sampling mechanism are ignored. Through a sensitivity analysis approach, this framework can be used to obtain plausible values for parameters of interest given summary results from standard analysis. We develop an online tool for performing this sensitivity analysis. Simulations demonstrate promising properties of the proposed method. We apply our approach to study bias in disease‐gene association studies using electronic health record data from the Michigan Genomics Initiative, a longitudinal biorepository effort within the University of Michigan health system.

  5. Biobanks linked to electronic health records provide rich resources for health‐related research. With improvements in administrative and informatics infrastructure, the availability and utility of data from biobanks have dramatically increased. In this paper, we first aim to characterize the current landscape of available biobanks and to describe specific biobanks, including their place of origin, size, and data types. The development and accessibility of large‐scale biorepositories provide the opportunity to accelerate agnostic searches, expedite discoveries, and conduct hypothesis‐generating studies of disease‐treatment, disease‐exposure, and disease‐gene associations. Rather than designing and implementing a single study focused on a few targeted hypotheses, researchers can potentially use biobanks' existing resources to answer an expanded selection of exploratory questions as quickly as they can analyze them. However, there are many obvious and subtle challenges with the design and analysis of biobank‐based studies. Our second aim is to discuss statistical issues related to biobank research such as study design, sampling strategy, phenotype identification, and missing data. We focus our discussion on biobanks that are linked to electronic health records. Some of the analytic issues are illustrated using data from the Michigan Genomics Initiative and UK Biobank, two biobanks with two different recruitment mechanisms. We summarize the current body of literature for addressing these challenges and discuss some standing open problems. This work complements and extends recent reviews about biobank‐based research and serves as a resource catalog with analytical and practical guidance for statisticians, epidemiologists, and other medical researchers pursuing research using biobanks.

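Items 2 and 3 above report both underreporting factors and the share of cases or deaths that were reported. The two quantities are related by simple arithmetic: if the underreporting factor is read as estimated total events divided by reported events, the reported share is its reciprocal. The following is a minimal Python sketch of that conversion, not code from the cited papers or their R packages; the function names are hypothetical, and the definition of the factor as total over reported is an assumption inferred from the figures quoted in the abstracts.

```python
def fraction_reported(underreporting_factor: float) -> float:
    """Share of estimated total events that were reported.

    Assumes underreporting_factor = estimated total / reported,
    so it must be at least 1 (hypothetical helper, not from the papers).
    """
    if underreporting_factor < 1:
        raise ValueError("underreporting factor must be >= 1")
    return 1.0 / underreporting_factor


def fraction_unreported(underreporting_factor: float) -> float:
    """Share of estimated total events that went unreported."""
    return 1.0 - fraction_reported(underreporting_factor)


if __name__ == "__main__":
    # Point estimates quoted in item 3 (India, cases and deaths).
    india = {
        "cases, Feb 1, 2021": 11.1,
        "cases, Jul 1, 2021": 19.2,
        "deaths, Feb 1, 2021": 3.58,
        "deaths, Jul 1, 2021": 4.55,
    }
    for label, factor in india.items():
        print(f"{label}: {100 * fraction_reported(factor):.1f}% reported")

    # Range quoted in item 2 (Delhi, cases, July 10, 2020).
    for factor in (34, 53):
        print(f"factor {factor}: {100 * fraction_unreported(factor):.1f}% unreported")
```

Under this reading, the India point estimates reproduce the reported percentages given in item 3, and the Delhi range of 34–53 corresponds to roughly 97–98% of cases unreported, in line with the 96–98% stated in item 2 once the serosurvey-based range is included.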